Unsupervised Improvement of Named Entity Extraction in Short Informal Context Using Disambiguation Clues
نویسندگان
چکیده
Short context messages (like tweets and SMS’s) are a potentially rich source of continuously and instantly updated information. Shortness and informality of such messages are challenges for Natural Language Processing tasks. Most efforts done in this direction rely on machine learning techniques which are expensive in terms of data collection and training. In this paper we present an unsupervised Semantic Web-driven approach to improve the extraction process by using clues from the disambiguation process. For extraction we used a simple Knowledge-Base matching technique combined with a clustering-based approach for disambiguation. Experimental results on a self-collected set of tweets (as an example of short context messages) show improvement in extraction results when using unsupervised feedback from the disambiguation process.
منابع مشابه
Semantic Web and Information Extraction SWAIE 2012 Workshop in conjunction with the 18 th International Conference on Knowledge Engineering and Knowledge Management
Short context messages (like tweets and SMS’s) are a potentially rich source of continuously and instantly updated information. Shortness and informality of such messages are challenges for Natural Language Processing tasks. Most efforts done in this direction rely on machine learning techniques which are expensive in terms of data collection and training. In this paper we present an unsupervis...
متن کاملNEED4Tweet: A Twitterbot for Tweets Named Entity Extraction and Disambiguation
In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required; and...
متن کاملAn Adaptive Approach to Named Entity Extraction for Meeting Applications
Named entity extraction has been intensively investigated in the past several years. Both statistical approaches and rule-based approaches have achieved satisfactory performance for regular written/spoken language. However when applied to highly informal or ungrammatical languages, e.g., meeting languages, because of the many mismatches in language genre, the performance of existing methods dec...
متن کاملImprovement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
متن کاملKnowledge-based Approach for Event Extraction from Arabic Tweets
Tweets provide a continuous update on current events. However, Tweets are short, personalized and noisy, thus raises more challenges for event extraction and representation. Extracting events out of Arabic tweets is a new research domain where few examples – if any – of previous work can be found. This paper describes a knowledge-based approach for fostering event extraction out of Arabic tweet...
متن کامل